dgit.raspbian.org Git - linux.git/log

x86/speculation: Support 'mitigations=' cmdline option

commit d68be4c4d31295ff6ae34a8ddfaa4c1a8ff42812 upstream

Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0024-x86-speculation-Support-mitigations-cmdline-option.patch

cpu/speculation: Add 'mitigations=' cmdline option

commit 98af8452945c55652de68536afdde3b520fec429 upstream

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0023-cpu-speculation-Add-mitigations-cmdline-option.patch

x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off

commit e2c3c94788b08891dcf3dbe608f9880523ecd71b upstream

This code is only for CPUs which are affected by MSBDS, but are *not*
affected by the other two MDS issues.

For such CPUs, enabling the mds_idle_clear mitigation is enough to
mitigate SMT.

However if user boots with 'mds=off' and still has SMT enabled, we should
not report that SMT is mitigated:

$cat /sys//devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated

But rather:
Vulnerable; SMT vulnerable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190412215118.294906495@localhost.localdomain
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0022-x86-speculation-mds-Print-SMT-vulnerable-on-MSBDS-wi.patch

x86/speculation/mds: Fix comment

commit cae5ec342645746d617dd420d206e1588d47768a upstream

s/L1TF/MDS/

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0021-x86-speculation-mds-Fix-comment.patch

x86/speculation/mds: Add SMT warning message

commit 39226ef02bfb43248b7db12a4fdccb39d95318e3 upstream

MDS is vulnerable with SMT. Make that clear with a one-time printk
whenever SMT first gets enabled.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0020-x86-speculation-mds-Add-SMT-warning-message.patch

x86/speculation: Move arch_smt_update() call to after mitigation decisions

commit 7c3658b20194a5b3209a143f63bc9c643c6a3ae2 upstream

arch_smt_update() now has a dependency on both Spectre v2 and MDS
mitigations. Move its initial call to after all the mitigation decisions
have been made.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0019-x86-speculation-Move-arch_smt_update-call-to-after-m.patch

x86/speculation/mds: Add mds=full,nosmt cmdline option

commit d71eb0ce109a124b0fa714832823b9452f2762cf upstream

Add the mds=full,nosmt cmdline option. This is like mds=full, but with
SMT disabled if the CPU is vulnerable.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0018-x86-speculation-mds-Add-mds-full-nosmt-cmdline-optio.patch

Documentation: Add MDS vulnerability documentation

commit 5999bbe7a6ea3c62029532ec84dc06003a1fa258 upstream

Add the initial MDS vulnerability documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0017-Documentation-Add-MDS-vulnerability-documentation.patch

Documentation: Move L1TF to separate directory

commit 65fd4cb65b2dad97feb8330b6690445910b56d6a upstream

Move L!TF to a separate directory so the MDS stuff can be added at the
side. Otherwise the all hardware vulnerabilites have their own top level
entry. Should have done that right away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0016-Documentation-Move-L1TF-to-separate-directory.patch

x86/speculation/mds: Add mitigation mode VMWERV

commit 22dd8365088b6403630b82423cf906491859b65e upstream

In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.

Introduce an internal mitigation mode VMWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.

That said: Virtual Machines Will Eventually Receive Vaccine

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0015-x86-speculation-mds-Add-mitigation-mode-VMWERV.patch

x86/speculation/mds: Add sysfs reporting for MDS

commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream

Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0014-x86-speculation-mds-Add-sysfs-reporting-for-MDS.patch

x86/speculation/mds: Add mitigation control for MDS

commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal straight forward initial implementation which just
provides an always on/off mode. The command line parameter is:

mds=[full|off]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0013-x86-speculation-mds-Add-mitigation-control-for-MDS.patch

x86/speculation/mds: Conditionally clear CPU buffers on idle entry

commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.

Add the actual invocations to the different halt/mwait variants which
covers all usage sites. mwaitx is not patched as it's not available on
Intel CPUs.

The buffer clear is only invoked before entering the C-State to prevent
that stale data from the idling CPU is spilled to the Hyper-Thread sibling
after the Store buffer got repartitioned and all entries are available to
the non idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. Now CPU which returned from idle could be
speculatively exposed to contents of the sibling, but the buffers are
flushed either on exit to user space or on VMENTER.

When later on conditional buffer clearing is implemented on top of this,
then there is no action required either because before returning to user
space the context switch will set the condition flag which causes a flush
on the return to user path.

Note, that the buffer clearing on idle is only sensible on CPUs which are
solely affected by MSBDS and not any other variant of MDS because the other
MDS variants cannot be mitigated when SMT is enabled, so the buffer
clearing on idle would be a window dressing exercise.

This intentionally does not handle the case in the acpi/processor_idle
driver which uses the legacy IO port interface for C-State transitions for
two reasons:

- The acpi/processor_idle driver was replaced by the intel_idle driver
   almost a decade ago. Anything Nehalem upwards supports it and defaults
   to that new driver.

- The legacy IO port interface is likely to be used on older and therefore
   unaffected CPUs or on systems which do not receive microcode updates
   anymore, so there is no point in adding that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0012-x86-speculation-mds-Conditionally-clear-CPU-buffers-.patch

x86/kvm/vmx: Add MDS protection when L1D Flush is not active

commit 650b68a0622f933444a6d66936abb3103029413b upstream

CPUs which are affected by L1TF and MDS mitigate MDS with the L1D Flush on
VMENTER when updated microcode is installed.

If a CPU is not affected by L1TF or if the L1D Flush is not in use, then
MDS mitigation needs to be invoked explicitly.

For these cases, follow the host mitigation state and invoke the MDS
mitigation before VMENTER.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0011-x86-kvm-vmx-Add-MDS-protection-when-L1D-Flush-is-not.patch

x86/speculation/mds: Clear CPU buffers on exit to user

commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() and do_nmi() right before actually returning.

Add documentation which kernel to user space transition this covers and
explain why some corner cases are not mitigated.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0010-x86-speculation-mds-Clear-CPU-buffers-on-exit-to-use.patch

x86/speculation/mds: Add mds_clear_cpu_buffers()

commit 6a9e529272517755904b7afa639f6db59ddb793e upstream

The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by
clearing the affected CPU buffers. The mechanism for clearing the buffers
uses the unused and obsolete VERW instruction in combination with a
microcode update which triggers a CPU buffer clear when VERW is executed.

Provide a inline function with the assembly magic. The argument of the VERW
instruction must be a memory operand as documented:

  "MD_CLEAR enumerates that the memory-operand variant of VERW (for
   example, VERW m16) has been extended to also overwrite buffers affected
   by MDS. This buffer overwriting functionality is not guaranteed for the
   register operand variant of VERW."

Documentation also recommends to use a writable data segment selector:

  "The buffer overwriting occurs regardless of the result of the VERW
   permission check, as well as when the selector is null or causes a
   descriptor load segment violation. However, for lowest latency we
   recommend using a selector that indicates a valid writable data
   segment."

Add x86 specific documentation about MDS and the internal workings of the
mitigation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0009-x86-speculation-mds-Add-mds_clear_cpu_buffers.patch

x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests

commit 6c4dbbd14730c43f4ed808a9c42ca41625925c22 upstream

X86_FEATURE_MD_CLEAR is a new CPUID bit which is set when microcode
provides the mechanism to invoke a flush of various exploitable CPU buffers
by invoking the VERW instruction.

Hand it through to guests so they can adjust their mitigations.

This also requires corresponding qemu changes, which are available
separately.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0008-x86-kvm-Expose-X86_FEATURE_MD_CLEAR-to-guests.patch

x86/speculation/mds: Add BUG_MSBDS_ONLY

commit e261f209c3666e842fd645a1e31f001c3a26def9 upstream

This bug bit is set on CPUs which are only affected by Microarchitectural
Store Buffer Data Sampling (MSBDS) and not by any other MDS variant.

This is important because the Store Buffers are partitioned between
Hyper-Threads so cross thread forwarding is not possible. But if a thread
enters or exits a sleep state the store buffer is repartitioned which can
expose data from one thread to the other. This transition can be mitigated.

That means that for CPUs which are only affected by MSBDS SMT can be
enabled, if the CPU is not affected by other SMT sensitive vulnerabilities,
e.g. L1TF. The XEON PHI variants fall into that category. Also the
Silvermont/Airmont ATOMs, but for them it's not really relevant as they do
not support SMT, but mark them for completeness sake.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0007-x86-speculation-mds-Add-BUG_MSBDS_ONLY.patch

x86/speculation/mds: Add basic bug infrastructure for MDS

commit ed5194c2732c8084af9fd159c146ea92bf137128 upstream

Microarchitectural Data Sampling (MDS), is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:

- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLDPS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

All variants have the same mitigation for single CPU thread case (SMT off),
so the kernel can treat them as one MDS issue.

Add the basic infrastructure to detect if the current CPU is affected by
MDS.

[ tglx: Rewrote changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0006-x86-speculation-mds-Add-basic-bug-infrastructure-for.patch

x86/speculation: Consolidate CPU whitelists

commit 36ad35131adacc29b328b9c8b6277a8bf0d6fd5d upstream

The CPU vulnerability whitelists have some overlap and there are more
whitelists coming along.

Use the driver_data field in the x86_cpu_id struct to denote the
whitelisted vulnerabilities and combine all whitelists into one.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0005-x86-speculation-Consolidate-CPU-whitelists.patch

x86/msr-index: Cleanup bit defines

commit d8eabc37310a92df40d07c5a8afc53cebf996716 upstream

Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.

Clean it up.

[ Josh Poimboeuf: Fix tools build ]

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0004-x86-msr-index-Cleanup-bit-defines.patch

kvm: x86: Report STIBP on GET_SUPPORTED_CPUID

commit d7b09c827a6cf291f66637a36f46928dd1423184 upstream

Months ago, we have added code to allow direct access to MSR_IA32_SPEC_CTRL
to the guest, which makes STIBP available to guests. This was implemented
by commits d28b387fb74d ("KVM/VMX: Allow direct access to
MSR_IA32_SPEC_CTRL") and b2ac58f90540 ("KVM/SVM: Allow direct access to
MSR_IA32_SPEC_CTRL").

However, we never updated GET_SUPPORTED_CPUID to let userspace know that
STIBP can be enabled in CPUID. Fix that by updating
kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0003-kvm-x86-Report-STIBP-on-GET_SUPPORTED_CPUID.patch

x86/cpu: Sanitize FAM6_ATOM naming

commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream

Going primarily by:

  https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

  for i in `git grep -l FAM6_ATOM` ; do
sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
  done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0002-x86-cpu-Sanitize-FAM6_ATOM-naming.patch

Documentation/l1tf: Fix small spelling typo

commit 60ca05c3b44566b70d64fbb8e87a6e0c67725468 upstream

Fix small typo (wiil -> will) in the "3.4. Nested virtual machines"
section.

Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Cc: linux-kernel@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: trivial@kernel.org
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Gbp-Pq: Topic bugfix/all/spec
Gbp-Pq: Name 0001-Documentation-l1tf-Fix-small-spelling-typo.patch

fs: prevent page refcount overflow in pipe_buf_get

commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream.

Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount. All
callers converted to handle a failure.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0004-fs-prevent-page-refcount-overflow-in-pipe_buf_get.patch

mm: prevent get_user_pages() from overflowing page refcount

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0003-mm-prevent-get_user_pages-from-overflowing-page-refc.patch

mm: add 'try_get_page()' helper function

commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upstream.

This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe". It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page. The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0002-mm-add-try_get_page-helper-function.patch

mm: make page ref count overflow check tighter and more explicit

commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream.

We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0001-mm-make-page-ref-count-overflow-check-tighter-and-mo.patch

tracing: Fix buffer_ref pipe ops

commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.

This fixes multiple issues in buffer_pipe_buf_ops:

- The ->steal() handler must not return zero unless the pipe buffer has
   the only reference to the page. But generic_pipe_buf_steal() assumes
   that every reference to the pipe is tracked by the page's refcount,
   which isn't true for these buffers - buffer_pipe_buf_get(), which
   duplicates a buffer, doesn't touch the page's refcount.
   Fix it by using generic_pipe_buf_nosteal(), which refuses every
   attempted theft. It should be easy to actually support ->steal, but the
   only current users of pipe_buf_steal() are the virtio console and FUSE,
   and they also only use it as an optimization. So it's probably not worth
   the effort.
- The ->get() and ->release() handlers can be invoked concurrently on pipe
   buffers backed by the same struct buffer_ref. Make them safe against
   concurrency by using refcount_t.
- The pointers stored in ->private were only zeroed out when the last
   reference to the buffer_ref was dropped. As far as I know, this
   shouldn't be necessary anyway, but if we do it, let's always do it.

Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name tracing-fix-buffer_ref-pipe-ops.patch

Fix aio_poll() races

commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream.

aio_poll() has to cope with several unpleasant problems:
* requests that might stay around indefinitely need to
be made visible for io_cancel(2); that must not be done to
a request already completed, though.
* in cases when ->poll() has placed us on a waitqueue,
wakeup might have happened (and request completed) before ->poll()
returns.
* worse, in some early wakeup cases request might end
up re-added into the queue later - we can't treat "woken up and
currently not in the queue" as "it's not going to stick around
indefinitely"
* ... moreover, ->poll() might have decided not to
put it on any queues to start with, and that needs to be distinguished
from the previous case
* ->poll() might have tried to put us on more than one queue.
Only the first will succeed for aio poll, so we might end up missing
wakeups.  OTOH, we might very well notice that only after the
wakeup hits and request gets completed (all before ->poll() gets
around to the second poll_wait()).  In that case it's too late to
decide that we have an error.

req->woken was an attempt to deal with that.  Unfortunately, it was
broken.  What we need to keep track of is not that wakeup has happened -
the thing might come back after that.  It's that async reference is
already gone and won't come back, so we can't (and needn't) put the
request on the list of cancellables.

The easiest case is "request hadn't been put on any waitqueues"; we
can tell by seeing NULL apt.head, and in that case there won't be
anything async.  We should either complete the request ourselves
(if vfs_poll() reports anything of interest) or return an error.

In all other cases we get exclusion with wakeups by grabbing the
queue lock.

If request is currently on queue and we have something interesting
from vfs_poll(), we can steal it and complete the request ourselves.

If it's on queue and vfs_poll() has not reported anything interesting,
we either put it on the cancellable list, or, if we know that it
hadn't been put on all queues ->poll() wanted it on, we steal it and
return an error.

If it's _not_ on queue, it's either been already dealt with (in which
case we do nothing), or there's aio_poll_complete_work() about to be
executed.  In that case we either put it on the cancellable list,
or, if we know it hadn't been put on all queues ->poll() wanted it on,
simulate what cancel would've done.

It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
suck, and unfortunately aio is not an exception...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0014-Fix-aio_poll-races.patch

aio: store event at final iocb_put()

commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream.

Instead of having aio_complete() set ->ki_res.{res,res2}, do that
explicitly in its callers, drop the reference (as aio_complete()
used to do) and delay the rest until the final iocb_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0013-aio-store-event-at-final-iocb_put.patch

aio: keep io_event in aio_kiocb

commit a9339b7855094ba11a97e8822ae038135e879e79 upstream.

We want to separate forming the resulting io_event from putting it
into the ring buffer.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0012-aio-keep-io_event-in-aio_kiocb.patch

aio: fold lookup_kiocb() into its sole caller

commit 833f4154ed560232120bc475935ee1d6a20e159f upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0011-aio-fold-lookup_kiocb-into-its-sole-caller.patch

pin iocb through aio.

commit b53119f13a04879c3bf502828d99d13726639ead upstream.

aio_poll() is not the only case that needs file pinned; worse, while
aio_read()/aio_write() can live without pinning iocb itself, the
proof is rather brittle and can easily break on later changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0010-pin-iocb-through-aio.patch

aio: simplify - and fix - fget/fput for io_submit()

commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

"In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0009-aio-simplify-and-fix-fget-fput-for-io_submit.patch

aio: initialize kiocb private in case any filesystems expect it.

commit ec51f8ee1e63498e9f521ec0e5a6d04622bb2c67 upstream.

A recent optimization had left private uninitialized.

Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0008-aio-initialize-kiocb-private-in-case-any-filesystems.patch

aio: abstract out io_event filler helper

commit 875736bb3f3ded168469f6a14df7a938416a99d5 upstream.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0007-aio-abstract-out-io_event-filler-helper.patch

aio: split out iocb copy from io_submit_one()

commit 88a6f18b950e2e4dce57d31daa151105f4f3dcff upstream.

In preparation of handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0006-aio-split-out-iocb-copy-from-io_submit_one.patch

aio: use iocb_put() instead of open coding it

commit 71ebc6fef0f53459f37fb39e1466792232fa52ee upstream.

Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0005-aio-use-iocb_put-instead-of-open-coding-it.patch

aio: don't zero entire aio_kiocb aio_get_req()

commit 2bc4ca9bb600cbe36941da2b2a67189fc4302a04 upstream.

It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0004-aio-don-t-zero-entire-aio_kiocb-aio_get_req.patch

aio: separate out ring reservation from req allocation

commit 432c79978c33ecef91b1b04cea6936c20810da29 upstream.

This is in preparation for certain types of IO not needing a ring
reserveration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0003-aio-separate-out-ring-reservation-from-req-allocatio.patch

aio: use assigned completion handler

commit bc9bff61624ac33b7c95861abea1af24ee7a94fc upstream.

We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0002-aio-use-assigned-completion-handler.patch

aio: clear IOCB_HIPRI

commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 upstream.

No one is going to poll for aio (yet), so we must clear the HIPRI
flag, as we would otherwise send it down the poll queues, where no
one will be polling for completions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
IOCB_HIPRI, not RWF_HIPRI.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name 0001-aio-clear-IOCB_HIPRI.patch

vfio/type1: Limit DMA mappings per container

Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory. This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name vfio-type1-Limit-DMA-mappings-per-container.patch

ntfs: mark it as broken

NTFS has unfixed issues CVE-2018-12929, CVE-2018-12930, and
CVE-2018-12931. ntfs-3g is a better supported alternative.

Make sure it can't be enabled even in custom kernels.

Gbp-Pq: Topic debian
Gbp-Pq: Name ntfs-mark-it-as-broken.patch

xen/pciback: Don't disable PCI_COMMAND on PCI device reset.

There is no need for this at all. Worst it means that if
the guest tries to write to BARs it could lead (on certain
platforms) to PCI SERR errors.

Please note that with af6fc858a35b90e89ea7a7ee58e66628c55c776b
"xen-pciback: limit guest control of command register"
a guest is still allowed to enable those control bits (safely), but
is not allowed to disable them and that therefore a well behaved
frontend which enables things before using them will still
function correctly.

This is done via an write to the configuration register 0x4 which
triggers on the backend side:
command_write
  \- pci_enable_device
     \- pci_enable_device_flags
        \- do_pci_enable_device
           \- pcibios_enable_device
              \-pci_enable_resourcess
                [which enables the PCI_COMMAND_MEMORY|PCI_COMMAND_IO]

However guests (and drivers) which don't do this could cause
problems, including the security issues which XSA-120 sought
to address.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name xen-pciback-Don-t-disable-PCI_COMMAND-on-PCI-device-.patch

PCI: Set pci=nobios by default

CONFIG_PCI_GOBIOS results in physical addresses 640KB-1MB being mapped
W+X, which is undesirable for security reasons and will result in a
warning at boot now that we enable CONFIG_DEBUG_WX.

This can be overridden using the kernel parameter "pci=nobios", but we
want to disable W+X by default. Disable PCI BIOS probing by default;
it can still be enabled using "pci=bios".

Gbp-Pq: Topic debian
Gbp-Pq: Name i386-686-pae-pci-set-pci-nobios-by-default.patch

MODSIGN: Make shash allocation failure fatal

mod_is_hash_blacklisted() currently returns 0 (suceess) if
crypto_alloc_shash() fails. This should instead be a fatal error,
so unwrap and pass up the error code.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name modsign-make-shash-allocation-failure-fatal.patch

MODSIGN: check the attributes of db and mok

That's better for checking the attributes of db and mok variables
before loading certificates to kernel keyring.

For db and dbx, both of them are authenticated variables. Which
means that they can only be modified by manufacturer's key. So
the kernel should checks EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS
attribute before we trust it.

For mok-rt and mokx-rt, both of them are created by shim boot loader
to forward the mok/mokx content to runtime. They must be runtime-volatile
variables. So kernel should checks that the attributes map did not set
EFI_VARIABLE_NON_VOLATILE bit before we trust it.

Cc: David Howells <dhowells@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
[Rebased by Luca Boccassi]

Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0004-MODSIGN-check-the-attributes-of-db-and-mok.patch

MODSIGN: checking the blacklisted hash before loading a kernel module

This patch adds the logic for checking the kernel module's hash
base on blacklist. The hash must be generated by sha256 and enrolled
to dbx/mokx.

For example:
sha256sum sample.ko
mokutil --mokx --import-hash $HASH_RESULT

Whether the signature on ko file is stripped or not, the hash can be
compared by kernel.

Cc: David Howells <dhowells@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
[Rebased by Luca Boccassi]

Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0003-MODSIGN-checking-the-blacklisted-hash-before-loading-a-kernel-module.patch

MODSIGN: load blacklist from MOKx

This patch adds the logic to load the blacklisted hash and
certificates from MOKx which is maintained by shim bootloader.

Cc: David Howells <dhowells@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
[Rebased by Luca Boccassi]

Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0002-MODSIGN-load-blacklist-from-MOKx.patch

MODSIGN: do not load mok when secure boot disabled

The mok can not be trusted when the secure boot is disabled. Which
means that the kernel embedded certificate is the only trusted key.

Due to db/dbx are authenticated variables, they needs manufacturer's
KEK for update. So db/dbx are secure when secureboot disabled.

Cc: David Howells <dhowells@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
[Rebased by Luca Boccassi]

Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0001-MODSIGN-do-not-load-mok-when-secure-boot-disabled.patch

modsign: use all trusted keys to verify module signature

Make mod_verify_sig to use all trusted keys. This allows keys in
secondary_trusted_keys to be used to verify PKCS#7 signature on a
kernel module.

Signed-off-by: Ke Wu <mikewu@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0007-modsign-Use-secondary-trust-keyring-for-module-signi.patch

Make get_cert_list() not complain about cert lists that aren't present.

Signed-off-by: Peter Jones <pjones@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0006-Make-get_cert_list-not-complain-about-cert-lists-tha.patch

MODSIGN: Allow the "db" UEFI variable to be suppressed

If a user tells shim to not use the certs/hashes in the UEFI db variable
for verification purposes, shim will set a UEFI variable called
MokIgnoreDB. Have the uefi import code look for this and ignore the db
variable if it is found.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0005-MODSIGN-Allow-the-db-UEFI-variable-to-be-suppressed.patch

MODSIGN: Import certificates from UEFI Secure Boot

Secure Boot stores a list of allowed certificates in the 'db' variable.
This imports those certificates into the system trusted keyring.  This
allows for a third party signing certificate to be used in conjunction
with signed modules.  By importing the public certificate into the 'db'
variable, a user can allow a module signed with that certificate to
load.  The shim UEFI bootloader has a similar certificate list stored
in the 'MokListRT' variable.  We import those as well.

Secure Boot also maintains a list of disallowed certificates in the 'dbx'
variable.  We load those certificates into the newly introduced system
blacklist keyring and forbid any module signed with those from loading and
forbid the use within the kernel of any key with a matching hash.

This facility is enabled by setting CONFIG_LOAD_UEFI_KEYS.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0004-MODSIGN-Import-certificates-from-UEFI-Secure-Boot.patch

efi: Add an EFI signature blob parser

Add a function to parse an EFI signature blob looking for elements of
interest. A list is made up of a series of sublists, where all the
elements in a sublist are of the same type, but sublists can be of
different types.

For each sublist encountered, the function pointed to by the
get_handler_for_guid argument is called with the type specifier GUID and
returns either a pointer to a function to handle elements of that type or
NULL if the type is not of interest.

If the sublist is of interest, each element is passed to the handler
function in turn.

Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0003-efi-Add-an-EFI-signature-blob-parser.patch

efi: Add EFI signature data types

Add the data types that are used for containing hashes, keys and
certificates for cryptographic verification along with their corresponding
type GUIDs.

Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0002-efi-Add-EFI-signature-data-types.patch

KEYS: Allow unrestricted boot-time addition of keys to secondary keyring

Allow keys to be added to the system secondary certificates keyring during
kernel initialisation in an unrestricted fashion. Such keys are implicitly
trusted and don't have their trust chains checked on link.

This allows keys in the UEFI database to be added in secure boot mode for
the purposes of module signing.

Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/db-mok-keyring
Gbp-Pq: Name 0001-KEYS-Allow-unrestricted-boot-time-addition-of-keys-t.patch

lockdown: Refer to Debian wiki until manual page exists

The lockdown denial log message currently refers to a
"kernel_lockdown.7" manual page, which is supposed to document it.
That manual page hasn't been accepted by the man-pages project and
doesn't even seem to have been submitted yet. For now, refer to the
Debian wiki.

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name lockdown-refer-to-debian-wiki-until-manual-page-exists.patch

arm64: add kernel config option to lock down when in Secure Boot mode

Add a kernel configuration option to lock down the kernel, to restrict
userspace's ability to modify the running kernel when UEFI Secure Boot is
enabled. Based on the x86 patch by Matthew Garrett.

Determine the state of Secure Boot in the EFI stub and pass this to the
kernel using the FDT.

Signed-off-by: Linn Crosetto <linn@hpe.com>
[bwh: Forward-ported to 4.10: adjust context]
[Lukas Wunner: Forward-ported to 4.11: drop parts applied upstream]
[bwh: Forward-ported to 4.15 and lockdown patch set:
- Pass result of efi_get_secureboot() in stub through to
efi_set_secure_boot() in main kernel
- Use lockdown API and naming]
[bwh: Forward-ported to 4.19.3: adjust context in update_fdt()]
[dannf: Moved init_lockdown() call after uefi_init(), fixing SB detection]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name arm64-add-kernel-config-option-to-lock-down-when.patch

mtd: Disable slram and phram when locked down

The slram and phram drivers both allow mapping regions of physical
address space such that they can then be read and written by userland
through the MTD interface. This is probably usable to manipulate
hardware into overwriting kernel code on many systems. Prevent that
if locked down.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name mtd-disable-slram-and-phram-when-locked-down.patch

Enable cold boot attack mitigation

[Lukas Wunner: Forward-ported to 4.11: adjust context]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name enable-cold-boot-attack-mitigation.patch

efi: Lock down the kernel if booted in secure boot mode

UEFI Secure Boot provides a mechanism for ensuring that the firmware will
only load signed bootloaders and kernels. Certain use cases may also
require that all kernel modules also be signed. Add a configuration option
that to lock down the kernel - which includes requiring validly signed
modules - if the kernel is secure-booted.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
cc: linux-efi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0029-efi-Lock-down-the-kernel-if-booted-in-secure-boot-mo.patch

efi: Add an EFI_SECURE_BOOT flag to indicate secure boot mode

UEFI machines can be booted in Secure Boot mode. Add an EFI_SECURE_BOOT
flag that can be passed to efi_enabled() to find out whether secure boot is
enabled.

Move the switch-statement in x86's setup_arch() that inteprets the
secure_boot boot parameter to generic code and set the bit there.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
cc: linux-efi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0028-efi-Add-an-EFI_SECURE_BOOT-flag-to-indicate-secure-b.patch

bpf: Restrict kernel image access functions when the kernel is locked down

There are some bpf functions can be used to read kernel memory:
bpf_probe_read, bpf_probe_write_user and bpf_trace_printk. These allow
private keys in kernel memory (e.g. the hibernation image signing key) to
be read by an eBPF program and kernel memory to be altered without
restriction.

Completely prohibit the use of BPF when the kernel is locked down.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee <jlee@suse.com>
cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
[bwh: Adjust context to apply after commit dcab51f19b29
"bpf: Expose check_uarg_tail_zero()"]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0027-bpf-Restrict-kernel-image-access-functions-when-the-.patch

Lock down kprobes

Disallow the creation of kprobes when the kernel is locked down by
preventing their registration. This prevents kprobes from being used to
access kernel memory, either to make modifications or to steal crypto data.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0026-Lock-down-kprobes.patch

Lock down /proc/kcore

Disallow access to /proc/kcore when the kernel is locked down to prevent
access to cryptographic data.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0025-Lock-down-proc-kcore.patch

debugfs: Disallow use of debugfs files when the kernel is locked down

Disallow opening of debugfs files when the kernel is locked down as various
drivers give raw access to hardware through debugfs.

Accesses to tracefs should use /sys/kernel/tracing/ rather than
/sys/kernel/debug/tracing/. Possibly a symlink should be emplaced.

Normal device interaction should be done through configfs or a miscdev, not
debugfs.

Note that this makes it unnecessary to specifically lock down show_dsts(),
show_devs() and show_call() in the asus-wmi driver.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Andy Shevchenko <andy.shevchenko@gmail.com>
cc: acpi4asus-user@lists.sourceforge.net
cc: platform-driver-x86@vger.kernel.org
cc: Matthew Garrett <matthew.garrett@nebula.com>
cc: Thomas Gleixner <tglx@linutronix.de>
[bwh: Forward-ported to 4.15]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0024-debugfs-Disallow-use-of-debugfs-files-when-the-kerne.patch

x86/mmiotrace: Lock down the testmmiotrace module

The testmmiotrace module shouldn't be permitted when the kernel is locked
down as it can be used to arbitrarily read and write MMIO space.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Ingo Molnar <mingo@kernel.org>
cc: "H. Peter Anvin" <hpa@zytor.com>
cc: x86@kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0023-x86-mmiotrace-Lock-down-the-testmmiotrace-module.patch

Lock down module params that specify hardware parameters (eg. ioport)

Provided an annotation for module parameters that specify hardware
parameters (such as io ports, iomem addresses, irqs, dma channels, fixed
dma buffers and other types).

Suggested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0022-Lock-down-module-params-that-specify-hardware-parame.patch

Lock down TIOCSSERIAL

Lock down TIOCSSERIAL as that can be used to change the ioport and irq
settings on a serial port. This only appears to be an issue for the serial
drivers that use the core serial code. All other drivers seem to either
ignore attempts to change port/irq or give an error.

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jiri Slaby <jslaby@suse.com>

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0021-Lock-down-TIOCSSERIAL.patch

Prohibit PCMCIA CIS storage when the kernel is locked down

Prohibit replacement of the PCMCIA Card Information Structure when the
kernel is locked down.

Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-pcmcia@lists.infradead.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0020-Prohibit-PCMCIA-CIS-storage-when-the-kernel-is-locke.patch

acpi: Disable APEI error injection if the kernel is locked down

ACPI provides an error injection mechanism, EINJ, for debugging and testing
the ACPI Platform Error Interface (APEI) and other RAS features. If
supported by the firmware, ACPI specification 5.0 and later provide for a
way to specify a physical memory address to which to inject the error.

Injecting errors through EINJ can produce errors which to the platform are
indistinguishable from real hardware errors. This can have undesirable
side-effects, such as causing the platform to mark hardware as needing
replacement.

While it does not provide a method to load unauthenticated privileged code,
the effect of these errors may persist across reboots and affect trust in
the underlying hardware, so disable error injection through EINJ if
the kernel is locked down.

Signed-off-by: Linn Crosetto <linn@hpe.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: linux-acpi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0018-acpi-Disable-APEI-error-injection-if-the-kernel-is-l.patch

acpi: Disable ACPI table override if the kernel is locked down

From the kernel documentation (initrd_table_override.txt):

  If the ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible
  to override nearly any ACPI table provided by the BIOS with an
  instrumented, modified one.

When securelevel is set, the kernel should disallow any unauthenticated
changes to kernel space.  ACPI tables contain code invoked by the kernel,
so do not allow ACPI tables to be overridden if the kernel is locked down.

Signed-off-by: Linn Crosetto <linn@hpe.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: linux-acpi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0017-acpi-Disable-ACPI-table-override-if-the-kernel-is-lo.patch

acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down

This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to modify the workings of hardware . Reject
the option when the kernel is locked down.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: Dave Young <dyoung@redhat.com>
cc: linux-acpi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0016-acpi-Ignore-acpi_rsdp-kernel-param-when-the-kernel-h.patch

ACPI: Limit access to custom_method when the kernel is locked down

custom_method effectively allows arbitrary access to system memory, making
it possible for an attacker to circumvent restrictions on module loading.
Disable it if the kernel is locked down.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: linux-acpi@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0015-ACPI-Limit-access-to-custom_method-when-the-kernel-i.patch

asus-wmi: Restrict debugfs interface when the kernel is locked down

We have no way of validating what all of the Asus WMI methods do on a given
machine - and there's a risk that some will allow hardware state to be
manipulated in such a way that arbitrary code can be executed in the
kernel, circumventing module loading restrictions. Prevent that if the
kernel is locked down.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: acpi4asus-user@lists.sourceforge.net
cc: platform-driver-x86@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0014-asus-wmi-Restrict-debugfs-interface-when-the-kernel-.patch

x86/msr: Restrict MSR access when the kernel is locked down

Writing to MSRs should not be allowed if the kernel is locked down, since
it could lead to execution of arbitrary code in kernel mode. Based on a
patch by Kees Cook.

MSR accesses are logged for the purposes of building up a whitelist as per
Alan Cox's suggestion.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: x86@kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0013-x86-msr-Restrict-MSR-access-when-the-kernel-is-locke.patch

x86: Lock down IO port access when the kernel is locked down

IO port access would permit users to gain access to PCI configuration
registers, which in turn (on a lot of hardware) give access to MMIO
register space. This would potentially permit root to trigger arbitrary
DMA, so lock it down by default.

This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and
KDDISABIO console ioctls.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: x86@kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0012-x86-Lock-down-IO-port-access-when-the-kernel-is-lock.patch

PCI: Lock down BAR access when the kernel is locked down

Any hardware that can potentially generate DMA has to be locked down in
order to avoid it being possible for an attacker to modify kernel code,
allowing them to circumvent disabled module loading or module signing.
Default to paranoid - in future we can potentially relax this for
sufficiently IOMMU-isolated devices.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: linux-pci@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0011-PCI-Lock-down-BAR-access-when-the-kernel-is-locked-d.patch

uswsusp: Disable when the kernel is locked down

uswsusp allows a user process to dump and then restore kernel state, which
makes it possible to modify the running kernel. Disable this if the kernel
is locked down.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
cc: linux-pm@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0010-uswsusp-Disable-when-the-kernel-is-locked-down.patch

hibernate: Disable when the kernel is locked down

There is currently no way to verify the resume image when returning
from hibernate. This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it when the
kernel is locked down.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: linux-pm@vger.kernel.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0009-hibernate-Disable-when-the-kernel-is-locked-down.patch

kexec_file: Restrict at runtime if the kernel is locked down

When KEXEC_VERIFY_SIG is not enabled, kernel should not load images through
kexec_file systemcall if the kernel is locked down unless IMA can be used
to validate the image.

This code was showed in Matthew's patch but not in git:
https://lkml.org/lkml/2015/3/13/778

Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
cc: kexec@lists.infradead.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0008-kexec_file-Restrict-at-runtime-if-the-kernel-is-lock.patch

Copy secure_boot flag in boot params across kexec reboot

Kexec reboot in case secure boot being enabled does not keep the secure
boot mode in new kernel, so later one can load unsigned kernel via legacy
kexec_load. In this state, the system is missing the protections provided
by secure boot.

Adding a patch to fix this by retain the secure_boot flag in original
kernel.

secure_boot flag in boot_params is set in EFI stub, but kexec bypasses the
stub. Fixing this issue by copying secure_boot flag across kexec reboot.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
cc: kexec@lists.infradead.org

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0007-Copy-secure_boot-flag-in-boot-params-across-kexec-re.patch

kexec: Disable at runtime if the kernel is locked down

kexec permits the loading and execution of arbitrary code in ring 0, which
is something that lock-down is meant to prevent. It makes sense to disable
kexec in this situation.

This does not affect kexec_file_load() which can check for a signature on the
image to be booted.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
cc: kexec@lists.infradead.org
[bwh: Adjust context to apply after commit a210fd32a46b
"kexec: add call to LSM hook in original kexec_load syscall"]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0006-kexec-Disable-at-runtime-if-the-kernel-is-locked-dow.patch

Restrict /dev/{mem,kmem,port} when the kernel is locked down

Allowing users to read and write to core kernel memory makes it possible
for the kernel to be subverted, avoiding module loading restrictions, and
also to steal cryptographic information.

Disallow /dev/mem and /dev/kmem from being opened this when the kernel has
been locked down to prevent this.

Also disallow /dev/port from being opened to prevent raw ioport access and
thus DMA from being used to accomplish the same thing.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0005-Restrict-dev-mem-kmem-port-when-the-kernel-is-locked.patch

Enforce module signatures if the kernel is locked down

If the kernel is locked down, require that all modules have valid
signatures that we can verify or that IMA can validate the file.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
[bwh: Adjust context to apply after commits 2c8fd268f418
"module: Do not access sig_enforce directly" and 5fdc7db6448a
"module: setup load info before module_sig_check()"]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0004-Enforce-module-signatures-if-the-kernel-is-locked-do.patch

ima: require secure_boot rules in lockdown mode

Require the "secure_boot" rules, whether or not it is specified
on the boot command line, for both the builtin and custom policies
in secure boot lockdown mode.

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[bwh: Adjust context to apply after commits 6f0911a666d1
"ima: fix updating the ima_appraise flag" and ef96837b0de4
"ima: add build time policy"]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0003-ima-require-secure_boot-rules-in-lockdown-mode.patch

Add a SysRq option to lift kernel lockdown

Make an option to provide a sysrq key that will lift the kernel lockdown,
thereby allowing the running kernel image to be accessed and modified.

On x86 this is triggered with SysRq+x, but this key may not be available on
all arches, so it is set by setting LOCKDOWN_LIFT_KEY in asm/setup.h.
Since this macro must be defined in an arch to be able to use this facility
for that arch, the Kconfig option is restricted to arches that support it.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: x86@kernel.org
[bwh: Forward-ported to 4.15]

Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0002-Add-a-SysRq-option-to-lift-kernel-lockdown.patch

Add the ability to lock down access to the running kernel image

Provide a single call to allow kernel code to determine whether the system
should be locked down, thereby disallowing various accesses that might
allow the running kernel image to be changed including the loading of
modules that aren't validly signed with a key we recognise, fiddling with
MSR registers and disallowing hibernation,

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Gbp-Pq: Topic features/all/lockdown
Gbp-Pq: Name 0001-Add-the-ability-to-lock-down-access-to-the-running-k.patch

Revert "net: stmmac: Send TSO packets always from Queue 0"

This reverts commit 496eaed7fe94df7202d7cbe37873f96bcdda375e, which
was commit c5acdbee22a1b200dde07effd26fd1f649e9ab8a upstream. This
introduces data races.

Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name revert-net-stmmac-send-tso-packets-always-from-queue.patch

mt76: Use the correct hweight8() function

mt76_init_stream_cap() and mt76_get_txpower() call __sw_hweight8()
directly, but that's only defined if CONFIG_GENERIC_HWEIGHT is
enabled. The function that works on all architectures is hweight8().

Fixes: 551e1ef4d291 ("mt76: add mt76_init_stream_cap routine")
Fixes: 9313faacbb4e ("mt76: move mt76x02_get_txpower to mt76 core")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[bwh: For 4.19, drop change in mt76_get_txpower()]

Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name mt76-use-the-correct-hweight8-function.patch

Revert "objtool: Fix CONFIG_STACK_VALIDATION=y warning for out-of-tree modules"

This reverts commit 9f0c18aec620bc9d82268b3cb937568dd07b43ff. This
check doesn't make sense for OOT modules as they should always use
a pre-built objtool.

Gbp-Pq: Topic debian
Gbp-Pq: Name revert-objtool-fix-config_stack_validation-y-warning.patch

Kbuild.include: addtree: Remove quotes before matching path

systemtap currently fails to build modules when the kernel source and
object trees are separate.

systemtap adds something like -I"/usr/share/systemtap/runtime" to
EXTRA_CFLAGS, and addtree should not adjust this as it's specifying an
absolute directory. But since make has no understanding of shell
quoting, it does anyway.

For a long time this didn't matter, because addtree would still emit
the original -I option after the adjusted one. However, commit
db547ef19064 ("Kbuild: don't add obj tree in additional includes")
changed it to remove the original -I option.

Remove quotes (both double and single) before matching against the
excluded patterns.

References: https://bugs.debian.org/856474
Reported-by: Jack Henschel <jackdev@mailbox.org>
Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Fixes: db547ef19064 ("Kbuild: don't add obj tree in additional includes")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name kbuild-include-addtree-remove-quotes-before-matching-path.patch

Partially revert "usb: Kconfig: using select for USB_COMMON dependency"

This reverts commit cb9c1cfc86926d0e86d19c8e34f6c23458cd3478 for
USB_LED_TRIG. This config symbol has bool type and enables extra code
in usb_common itself, not a separate driver. Enabling it should not
force usb_common to be built-in!

Fixes: cb9c1cfc8692 ("usb: Kconfig: using select for USB_COMMON dependency")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name partially-revert-usb-kconfig-using-select-for-usb_co.patch

fs: Add MODULE_SOFTDEP declarations for hard-coded crypto drivers

This helps initramfs builders and other tools to find the full
dependencies of a module.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[Lukas Wunner: Forward-ported to 4.11: drop parts applied upstream]

Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name fs-add-module_softdep-declarations-for-hard-coded-cr.patch

phy/marvell: disable 4-port phys

The Marvell PHY was originally disabled because it can cause networking
failures on some systems. According to Lennert Buytenhek this is because some
of the variants added did not share the same register layout. Since the known
cases are all 4-ports disable those variants (indicated by a 4 in the
penultimate position of the model name) until they can be audited for
correctness.

[bwh: Also #if-out the init functions for these PHYs to avoid
compiler warnings]

Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name disable-some-marvell-phys.patch

kbuild: Use -nostdinc in compile tests

gcc 4.8 and later include <stdc-predef.h> by default.  In some
versions of eglibc that includes <bits/predefs.h>, but that may be
missing when building with a biarch compiler.  Also <stdc-predef.h>
itself could be missing as we are only trying to build a kernel, not
userland.

The -nostdinc option disables this, though it isn't explicitly
documented.  This option is already used when actually building
the kernel, but not by cc-option and other tests.  This can result
in silently miscompiling the kernel.

References: https://bugs.debian.org/717557
References: https://bugs.debian.org/726861
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Gbp-Pq: Topic bugfix/all
Gbp-Pq: Name kbuild-use-nostdinc-in-compile-tests.patch

arm64: dts: allwinner: a64: Add Pine64-LTS device tree file

The Pine64-LTS is a variant of the Pine64 board, from the software
visible side resembling a SoPine module on a baseboard, though the
board has the SoC and DRAM integrated on one PCB.
Due to this it basically shares the DT with the SoPine baseboard, which
we mimic in our DT by inclucing the boardboard .dts into the new file,
just overwriting the model name.
Having a separate .dts for this seems useful, since we don't know yet if
there are subtle differences between the two. Also the SoC on the LTS
board is technically an "R18" instead of the original "A64", although as
far as we know this is just a relabelled version of the original SoC.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Gbp-Pq: Topic features/arm64
Gbp-Pq: Name arm64-dts-allwinner-a64-Add-Pine64-LTS-device-tree-f.patch